Field of the Invention

The invention concerns genes and methods for
expressing eukaryotic and viral proteins at high levels
in eukaryotic cells.



Background of the Invention

Expression of eukaryotic gene products in
prokaryotes is sometimes limited by the presence of
codons that are infrequently used in E. coli. Expression
of such genes can be enhanced by systematic substitution
of the endogenous codons with codons overrepresented in
highly expressed prokaryotic genes (Robinson et al.
1984). It is commonly supposed that rare codons cause
pausing of the ribosome, which leads to a failure to
complete the nascent polypeptide chain and an uncoupling
of transcription and translation. The mRNA 3' end of the
stalled ribosome is exposed to cellular ribonucleases,
which decreases the stability of the transcript.

Summary of the Invention

The invention features a synthetic gene encoding a 
protein normally expressed in mammalian cells wherein at 
least one non-preferred or less preferred codon in the 
natural gene encoding the mammalian protein has been 
replaced by a preferred codon encoding the same amino
acid. 

Preferred codons are: Ala (gcc); Arg (cgc); Asn
(aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His
(cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe
(ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg).
Less preferred codons are: Gly (ggg); Ile (att); Leu
(ctc); Ser (tcc); Val (gtc). All codons which do not fit
the description of preferred codons or less preferred
codons are non-preferred codons.
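
The replacement rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the inventors: the function names `translate` and `replace_with_preferred` and the example sequence are hypothetical, and codons for amino acids that the list above does not cover (Glu, Met, Trp, and stop codons) are simply left unchanged.

```python
from itertools import product

# Standard genetic code, built in the conventional t/c/a/g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred codons as listed in the text (one-letter amino acid -> codon).
# Amino acids absent from that list keep their original codon (assumption).
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
    "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
    "Y": "tac", "V": "gtg",
}


def split_codons(dna):
    """Split an in-frame coding sequence into lowercase codons."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(GENETIC_CODE[codon] for codon in split_codons(dna))


def replace_with_preferred(dna):
    """Swap every codon for the preferred codon of the same amino acid."""
    return "".join(
        PREFERRED.get(GENETIC_CODE[codon], codon) for codon in split_codons(dna)
    )


natural = "atgagagtaaaaggaatataa"  # hypothetical example: Met-Arg-Val-Lys-Gly-Ile-stop
synthetic = replace_with_preferred(natural)
print(synthetic)  # atgcgcgtgaagggcatctaa
```

Because every substitution is synonymous, `translate(synthetic)` equals `translate(natural)`; checking that equality is a cheap safeguard before a designed sequence is synthesized.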



By "protein normally expressed in mammalian cells"
is meant a protein which is expressed in mammalian cells
under natural conditions. The term includes genes in the
mammalian genome such as Factor VIII, Factor IX,
interleukins, and other proteins. The term also includes
genes which are expressed in a mammalian cell under
disease conditions, such as oncogenes, as well as genes
which are encoded by a virus (including a retrovirus)
which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic gene is
capable of expressing said mammalian protein at a level
which is at least 110%, 150%, 200%, 500%, 1,000%, or
10,000% of that expressed by said natural gene in an in
vitro mammalian cell culture system under identical
conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring
expression of the synthetic gene and corresponding
natural gene are described below. Other suitable
expression systems employing mammalian cells are well
known to those skilled in the art and are described in,
for example, the standard molecular biology reference
works noted below. Vectors suitable for expressing the
synthetic and natural genes are described below and in
the standard reference works described below. By
"expression" is meant protein expression. Expression can
be measured using an antibody specific for the protein of
interest. Such antibodies and measurement techniques are
well known to those skilled in the art. By "natural
gene" is meant the gene sequence which naturally encodes
the protein.

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the 
natural gene are non-preferred codons. 
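
The percentage thresholds above presuppose a way to classify each codon of a natural gene. The following Python sketch shows one way to do that; it follows the literal definition given in the Summary (any codon that is neither preferred nor less preferred counts as non-preferred, which includes the codons for Glu, Met and Trp and the stop codons). The function names and the example sequence are hypothetical.

```python
from collections import Counter

# Codon classes taken from the lists in the Summary.
PREFERRED = {
    "gcc", "cgc", "aac", "gac", "tgc", "cag", "ggc", "cac", "atc",
    "ctg", "aag", "ccc", "ttc", "agc", "acc", "tac", "gtg",
}
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc"}
LABELS = ("preferred", "less preferred", "non-preferred")


def classify_codon(codon):
    """Return the class of a single codon under the literal definition."""
    codon = codon.lower()
    if codon in PREFERRED:
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


def codon_class_fractions(dna):
    """Return the fraction of codons in each class for an in-frame sequence."""
    dna = dna.lower()
    if len(dna) == 0 or len(dna) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    counts = Counter(classify_codon(c) for c in codons)
    return {label: counts[label] / len(codons) for label in LABELS}


# Hypothetical four-codon example: gcc (preferred), ggg (less preferred),
# aaa and aga (both non-preferred).
fractions = codon_class_fractions("gccgggaaaaga")
print(fractions["non-preferred"])  # 0.5
```

A gene whose non-preferred fraction is high is, on the reasoning of this specification, a candidate for resynthesis.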

In a preferred embodiment the protein is an HIV protein. In
other preferred embodiments the protein is gp120
or gp160. In other preferred embodiments the
protein is a human protein.

The invention also features a method for preparing
a synthetic gene encoding a protein normally expressed by
mammalian cells. The method includes identifying non-preferred
and less-preferred codons in the natural gene
encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred
codon encoding the same amino acid as the replaced codon.

Under some circumstances (e.g., to permit
introduction of a restriction site) it may be desirable
to replace a non-preferred codon with a less preferred
codon rather than a preferred codon.

It is not necessary to replace all less preferred
or non-preferred codons with preferred codons. Increased
expression can be accomplished even with partial
replacement.

In other preferred embodiments the invention
features vectors (including expression vectors)
comprising the synthetic gene.

By "vector" is meant a DNA molecule, derived,
e.g., from a plasmid, bacteriophage, or mammalian or
insect virus, into which fragments of DNA may be inserted
or cloned. A vector will contain one or more unique
restriction sites and may be capable of autonomous
replication in a defined host or vehicle organism such
that the cloned sequence is reproducible. Thus, by
"expression vector" is meant any autonomous element
capable of directing the synthesis of a protein. Such
DNA expression vectors include mammalian plasmids and
viruses.

The invention also features synthetic gene
fragments which encode a desired portion of the protein.
Such synthetic gene fragments are similar to the
synthetic genes of the invention except that they encode
only a portion of the protein. Such gene fragments
preferably encode at least 50, 100, 150, or 500
contiguous amino acids of the protein.

In constructing the synthetic genes of the
invention it may be desirable to avoid CpG sequences as
these sequences may cause gene silencing.
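
Because several preferred codons (for example cgc, or gcc followed by a codon starting with g) create CpG dinucleotides, a designer may want to locate them in a candidate sequence. The sketch below is a hypothetical helper, not part of the described method; it reports CpG positions on the given strand, including those that span a codon boundary.

```python
def cpg_positions(dna):
    """Return the 0-based start index of every CpG dinucleotide in dna."""
    dna = dna.lower()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "cg"]


def count_cpg(dna):
    """Count CpG dinucleotides in dna."""
    return len(cpg_positions(dna))


# Hypothetical example: Ala-Asp-Val written with preferred codons...
with_preferred = "gccgacgtg"
# ...and with synonymous codons chosen here only to avoid CpG.
cpg_free = "gctgatgtg"

print(cpg_positions(with_preferred))  # [2, 5]
print(count_cpg(cpg_free))            # 0
```

Both CpGs in the first sequence straddle codon boundaries, so a check that looks only inside individual codons would miss them.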

The codon bias present in the HIV gp120 envelope
gene is also present in the gag and pol genes. Thus,
replacement of a portion of the non-preferred and less
preferred codons found in these genes with preferred
codons should produce a gene capable of higher level
expression. A large fraction of the codons in the human
genes encoding Factor VIII and Factor IX are
non-preferred codons or less preferred codons. Replacement
of a portion of these codons with preferred codons should
yield genes capable of higher level expression in
mammalian cell culture. Conversely, it may be desirable
to replace preferred codons in a naturally occurring gene
with less-preferred codons as a means of lowering
expression.

Standard reference works describing the general
principles of recombinant DNA technology include Watson,
J.D. et al., Molecular Biology of the Gene, Volumes I and
II, the Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell, J.E. et al.,
Molecular Cell Biology, Scientific American Books, Inc.,
publisher, New York, N.Y. (1986); Old, R.W., et al.,
Principles of Gene Manipulation: An Introduction to
Genetic Engineering, 2d edition, University of California
Press, publisher, Berkeley, CA (1981); and Maniatis, T., et
al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory, publisher, Cold Spring
Harbor, NY.

Description of the Drawings

Figure 1 depicts the sequence of the synthetic
gp120 (SEQ ID NO: 34) and a synthetic gp160 (SEQ ID NO:
35) gene in which codons have been replaced by those
found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic
gp120 (HIV-1 MN) gene. The shaded portions marked v1 to
v5 indicate hypervariable regions. The filled box
indicates the CD4 binding site. A limited number of the
unique restriction sites are shown: H (Hind3), Nh
(NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A
(AgeI) and No (NotI). The chemically synthesized DNA
fragments which served as PCR templates are shown below
the gp120 sequence, along with the locations of the
primers used for their amplification.

Figure 3 is a photograph of the results of
transient transfection assays used to measure gp120
expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids
expressing gp120 encoded by the IIIB isolate of HIV-1
(gp120IIIb), by the MN isolate (gp120mn), by the MN
isolate modified by substitution of the endogenous leader
peptide with that of the CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant
with the human CD5 leader (syngp120mn). Supernatants were
harvested following a 12 hour labeling period 60 hours
post-transfection and immunoprecipitated with CD4:IgG1
fusion protein and protein A sepharose.

Figure 4 is a graph depicting the results of ELISA
assays used to measure protein levels in supernatants of
transiently transfected 293T cells. Supernatants of 293T
cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120IIIb), by the MN
isolate (gp120mn), by the MN isolate modified by
substitution of the endogenous leader peptide with that
of CD5 antigen (gp120mnCD5L), or by the chemically
synthesized gene encoding the MN variant with human CD5
leader (syngp120mn) were harvested after 4 days and
tested in a gp120/CD4 ELISA. The level of gp120 is
expressed in ng/ml.

Figure 5, panel A is a photograph of a gel
illustrating the results of an immunoprecipitation assay
used to measure expression of the native and synthetic
gp120 in the presence of rev in trans and the RRE in cis.
In this experiment 293T cells were transiently
transfected by calcium phosphate coprecipitation of 10 μg
of plasmid expressing: (A) the synthetic gp120MN sequence
and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C)
the gp120 portion of HIV-1 IIIB and RRE in cis, all in
the presence or absence of rev expression. The RRE
constructs gp120IIIbRRE and syngp120mnRRE were generated
using an EagI/HpaI RRE fragment cloned by PCR from an
HIV-1 HXB2 proviral clone. Each gp120 expression plasmid
was cotransfected with 10 μg of either pCMVrev or CDM7
plasmid DNA. Supernatants were harvested 60 hours post
transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing
SDS-PAGE. The gel exposure time was extended to allow the
induction of gp120IIIbrre by rev to be demonstrated.
Figure 5, panel B is a shorter exposure of a similar
experiment in which syngp120mnrre was cotransfected with
or without pCMVrev. Figure 5, panel C is a schematic
diagram of the constructs used in panel A.

Figure 6 is a comparison of the sequence of the
wildtype rat THY-1 gene (wt) (SEQ ID NO: 37) and a
synthetic rat THY-1 gene (env) (SEQ ID NO: 36)
constructed by chemical synthesis and having the most
prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic
ratTHY-1 gene. The solid black box denotes the signal
peptide. The shaded box denotes the sequences in the
precursor which direct the attachment of a
phosphatidylinositol glycan anchor. Unique restriction sites used
for assembly of the THY-1 constructs are marked H
(Hind3), M (MluI), S (SacI) and No (NotI). The position
of the synthetic oligonucleotides employed in the
construction are shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow
cytometry analysis. In this experiment 293T cells were
transiently transfected with either wildtype rat THY-1
(dark line), ratTHY-1 with envelope codons (light line)
or vector only (dotted line). 293T cells were
transfected with the different expression plasmids by
calcium phosphate coprecipitation and stained with anti-
ratTHY-1 monoclonal antibody OX7 followed by a polyclonal
FITC-conjugated anti-mouse IgG antibody 3 days after
transfection.

Figure 9, panel A is a photograph of a gel
illustrating the results of immunoprecipitation analysis
of supernatants of human 293T cells transfected with
either syngp120mn (A) or a construct syngp120mn.rTHY-1env
which has the rTHY-1env gene in the 3' untranslated
region of the syngp120mn gene (B). The
syngp120mn.rTHY-1env construct was generated by inserting
a NotI adapter into the blunted Hind3 site of the
rTHY-1env plasmid. Subsequently, a 0.5 kb NotI fragment
containing the rTHY-1env gene was cloned into the
NotI site of the syngp120mn plasmid and tested for
correct orientation. Supernatants of 35S labelled cells
were harvested 72 hours post transfection, precipitated
with CD4:IgG fusion protein and protein A agarose, and
run on a 7% reducing SDS-PAGE. Figure 9, panel B is a
schematic diagram of the constructs used in the
experiment depicted in panel A of this figure.

Description of the Preferred Embodiments

Construction of a Synthetic gp120 Gene Having Codons
Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor
of the LAV subtype of HIV-1 was generated using software
developed by the University of Wisconsin Genetics
Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by
a collection of highly expressed human genes. For any
amino acid encoded by degenerate codons, the most favored
codon of the highly expressed genes is different from the
most favored codon of the HIV envelope precursor.
Moreover, a simple rule describes the pattern of favored
envelope codons wherever it applies: preferred codons
maximize the number of adenine residues in the viral RNA.
In all cases but one this means that the codon in which
the third position is A is the most frequently used. In
the special case of serine, three codons equally
contribute one A residue to the mRNA; together these
three account for 85% of the codons actually used in
envelope transcripts. A particularly striking example of
the A bias is found in the codon choice for arginine, in
which the AGA triplet accounts for 88% of all codons. In
addition to the preponderance of A residues, a marked
preference is seen for uridine among degenerate codons
whose third residue must be a pyrimidine. Finally, the
inconsistencies among the less frequently used variants
can be accounted for by the observation that the
dinucleotide CpG is underrepresented; thus the third
position is less likely to be G whenever the second
position is C, as in the codons for alanine, proline,
serine and threonine.
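
The kind of tabulation described above (the percentage of cases in which each synonymous codon is used for a given amino acid) can be reproduced with a short script. The sketch below is an illustration only and is not the Genetics Computer Group software; the function names and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence containing only a, c, g and t.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in the conventional t/c/a/g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(dna):
    """Split a coding sequence into lowercase codons, dropping a partial tail."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def codon_usage_percent(dna):
    """Map amino acid -> {codon: percent of that amino acid's codons}."""
    counts = Counter(split_codons(dna))
    totals = Counter()
    for codon, n in counts.items():
        totals[GENETIC_CODE[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = GENETIC_CODE[codon]
        usage[aa][codon] = 100.0 * n / totals[aa]
    return dict(usage)


def third_position_fraction(dna, base):
    """Fraction of codons whose third position is the given base."""
    codons = split_codons(dna)
    return sum(c[2] == base.lower() for c in codons) / len(codons)


# Toy example: four arginine codons, three of them aga.
usage = codon_usage_percent("agaagaagacgc")
print(usage["R"])  # {'aga': 75.0, 'cgc': 25.0}
print(third_position_fraction("agaagaagacgc", "a"))  # 0.75
```

Run on an envelope coding sequence, `third_position_fraction(seq, "a")` gives a quick single-number view of the A bias that the table shows codon by codon.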

TABLE 1: Codon frequency in the HIV-1 IIIb env gene
and in highly expressed human genes.



10 hlA 



High Bnv ^ 

C 68 16 
T 32 84 



Gc c c?, fla 

TG C 68 16 



15 « 17 5 

^ CA A 12 55 
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27 
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17 


18 


A 


13 


50 


G 


17 


5 
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21 


0 
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10 


88 
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18 
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CG C 37 n G 88 45 



20 

AG 



Aan 

25 AA C 78 30 

T 22 70 

Asp 

GA C 75 33 

T 25 67 

30 



LfiU 

35 CT 



TT 

40 

leza 

AA 



45 
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10 
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17 
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58 


17 
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30 
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20 
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18 
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G 
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32 



GA 
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25 


67 
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33 
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GG 
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13 
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14 


53 
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13 
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22 
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22 
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14 
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££2 

CC 



TT 
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48 


27 


TA 
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74 
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T 


19 


14 
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26 


92 


A 


16 
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5 


62 










G 


64 


18 



15 



Codon frequencies were calculated using the program
established by the University of Wisconsin Genetics
Computer Group. Numbers represent the percentage of
cases in which the particular codon is used. Codon usage
frequencies of envelope genes of other HIV-1 virus
isolates are comparable and show a similar bias.



In order to produce a gp120 gene capable of high
level expression in mammalian cells, a synthetic gene
encoding the gp120 segment of HIV-1 was constructed
(syngp120mn), based on the sequence of the most common
North American subtype, HIV-1 MN (Shaw et al. 1984; Gallo
et al. 1986). In this synthetic gp120 gene nearly all of
the native codons have been systematically replaced with
codons most frequently used in highly expressed human
genes (FIG. 1). This synthetic gene was assembled from
chemically synthesized oligonucleotides of 150 to 200
bases in length. If oligonucleotides exceeding 120 to
150 bases are chemically synthesized, the percentage of
full-length product can be low, and the vast excess of
material consists of shorter oligonucleotides. Since
these shorter fragments inhibit cloning and PCR
procedures, it can be very difficult to use
oligonucleotides exceeding a certain length. In order to
use crude synthesis material without prior purification,
single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose
gels and used as templates in the next PCR step. Two
adjacent fragments could be co-amplified because of
overlapping sequences at the end of either fragment.
These fragments, which were between 350 and 400 bp in
size, were subcloned into a pCDM7-derived plasmid
containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker. Each of the restriction enzymes in this
polylinker represents a site that is present at either
the 5' or 3' end of the PCR-generated fragments. Thus, by
sequential subcloning of each of the 4 long fragments,
the whole gene was assembled. For each fragment 3
to 6 different clones were subcloned and sequenced prior
to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in FIG. 2. The
sequence of the synthetic gp120 gene (and a synthetic
gp160 gene created using the same approach) is presented
in FIG. 1.

The mutation rate was considerable. The most
commonly found mutations were short (1 nucleotide) and
long (up to 30 nucleotides) deletions. In some cases it
was necessary to exchange parts with either synthetic
adapters or pieces from other subclones without mutation
in that particular region. Some deviations from strict
adherence to optimized codon usage were made to
accommodate the introduction of restriction sites into
the resulting gene to facilitate the replacement of
various segments (FIG. 2). These unique restriction sites
were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged
with the highly efficient leader peptide of the human CD5
antigen to facilitate secretion. The plasmid used for
construction is a derivative of the mammalian expression
vector pCDM7 transcribing the inserted gene under the
control of a strong human CMV immediate early promoter.

To compare the wild-type and synthetic gp120
coding sequences, the synthetic gp120 coding sequence was
inserted into a mammalian expression vector and tested in
transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations
in expression levels between different virus isolates and
artifacts induced by distinct leader sequences. The
gp120 HIV IIIb construct used as control was generated by
PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as
template. To exclude PCR induced mutations a KpnI/EarI
fragment containing approximately 1.2 kb of the gene was
exchanged with the respective sequence from the proviral
clone. The wildtype gp120mn constructs used as controls
were cloned by PCR from HIV-1 MN infected C8166 cells
(AIDS Repository, Rockville, MD) and expressed gp120
either with a native envelope or a CD5 leader sequence.
Since proviral clones were not available in this case,
two clones of each construct were tested to avoid PCR
artifacts. To determine the amount of secreted gp120
semi-quantitatively, supernatants of 293T cells
transiently transfected by calcium phosphate
coprecipitation were immunoprecipitated with soluble
CD4:immunoglobulin fusion protein and protein A
sepharose.

The results of this analysis (FIG. 3) show that
the synthetic gene product is expressed at a very high
level compared to that of the native gp120 controls. The
molecular weight of the synthetic gp120 gene product was
comparable to control proteins (FIG. 3) and appeared to
be in the range of 100 to 110 kd. The slightly faster
migration can be explained by the fact that in some tumor
cell lines like 293T glycosylation is either not complete
or altered to some extent.

To compare expression more accurately, gp120
protein levels were quantitated using a gp120/CD4 ELISA.
This analysis shows (FIG. 4) that ELISA data were
comparable to the immunoprecipitation data, with a gp120
concentration of approximately 125 ng/ml for the
synthetic gp120 gene, and less than the background cutoff
(5 ng/ml) for all the native gp120 genes. Thus,
expression of the synthetic gp120 gene appears to be at
least one order of magnitude higher than that of wildtype
gp120 genes. In the experiment shown the increase was at
least 25 fold.

The Role of rev in gp120 Expression

Since rev appears to exert its effect at several
steps in the expression of a viral transcript, the
possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was
tested. First, to rule out the possibility that negative
signal elements conferring either increased mRNA
degradation or nuclear retention were eliminated by
changing the nucleotide sequence, cytoplasmic mRNA levels
were tested. Cytoplasmic RNA was prepared by NP40 lysis
of transiently transfected 293T cells and subsequent
elimination of the nuclei by centrifugation. Cytoplasmic
RNA was subsequently prepared from lysates by multiple
phenol extractions and precipitation, spotted on
nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA of 293 cells transfected
with CDM7, gp120 IIIB, or syngp120 was isolated 36 hours
post transfection. Cytoplasmic RNA of HeLa cells
infected with wildtype vaccinia virus or recombinant
virus expressing gp120 IIIb or the synthetic gp120 gene
under the control of the 7.5 promoter was isolated 16
hours post infection. Equal amounts were spotted on
nitrocellulose using a slot blot device and hybridized
with randomly labelled 1.5 kb gp120IIIb and syngp120
fragments or human beta-actin. RNA expression levels
were quantitated by scanning the hybridized membranes
with a phosphorimager. The procedures used are described
in greater detail below.

This experiment demonstrated that there was no
significant difference in the mRNA levels of cells
transfected with either the native or synthetic gp120
gene. In fact, in some experiments the cytoplasmic mRNA
level of the synthetic gp120 gene was even lower than
that of the native gp120 gene.

These data were confirmed by measuring expression
from recombinant vaccinia viruses. Human 293 cells or
HeLa cells were infected with vaccinia virus expressing
wildtype gp120 IIIb or syngp120mn at a multiplicity of
infection of at least 10. Supernatants were harvested 24
hours post infection and immunoprecipitated with
CD4:immunoglobulin fusion protein and protein A sepharose.
The procedures used in this experiment are described in
greater detail below.

This experiment showed that the increased
expression of the synthetic gene was still observed when
the endogenous gene product and the synthetic gene
product were expressed from vaccinia virus recombinants
under the control of the strong mixed early and late 7.5k
promoter. Because vaccinia virus mRNAs are transcribed
and translated in the cytoplasm, increased expression of
the synthetic envelope gene in this experiment cannot be
attributed to improved export from the nucleus. This
experiment was repeated in two additional human cell
types, the kidney cancer cell line 293 and HeLa cells.
As with transfected 293T cells, mRNA levels were similar
in 293 cells infected with either recombinant vaccinia
virus.

Because it appears that codon usage has a
significant impact on expression in mammalian cells, the
codon frequency in the envelope genes of other
retroviruses was examined. This study found no clear
pattern of codon preference between retroviruses in
general. However, if viruses from the lentivirus genus,
to which HIV-1 belongs, were analyzed separately, a
codon usage bias almost identical to that of HIV-1 was
found. A codon frequency table from the envelope
glycoproteins of a variety of (predominantly type C)
retroviruses excluding the lentiviruses was prepared, and
compared to a codon frequency table created from the
envelope sequences of four lentiviruses not closely
related to HIV-1 (caprine arthritis encephalitis virus,
equine infectious anemia virus, feline immunodeficiency
virus, and visna virus) (Table 2). The codon usage
pattern for lentiviruses is strikingly similar to that of
HIV-1. In all cases but one, the preferred codon for
HIV-1 is the same as the preferred codon for the other
lentiviruses. The exception is proline, which is encoded
by CCT in 41% of non-HIV lentiviral envelope residues
and by CCA in 40% of residues, a situation which clearly
also reflects a significant preference for the triplet
ending in A. The pattern of codon usage by the
non-lentiviral envelope proteins does not show a similar
predominance of A residues, and is also not as skewed
toward third position C and G residues as is the codon
usage for the highly expressed human genes. In general,
non-lentiviral retroviruses appear to exploit the
different codons more equally, a pattern they share with
less highly expressed human genes.

TABLE 2: Codon frequency in the envelope gene of
lentiviruses (lenti) and non-lentiviral
retroviruses (other)

               Other  Lenti
Ala  GC   C      45     13
          T      26     37
          A      20     46
          G       9      3
Arg  CG   C      14      2
          T       6      3
          A      16      5
          G      17      3
     AG   A      31     51
          G      15     26
Asn  AA   C      49     31
          T      51     69
Asp  GA   C      55     33
          T      51     69
Cys  TG   C      53     21
          T       ?      ?
Gln  CA   A      52     69
          G      48     31
Glu  GA   A      57     68
          G      43     32
Gly  GG   C      21      8
          T      13      9
          A      37     56
          G      29     26
His  CA   C      51     38
          T      49     62
Ile  AT   C      38     16
          T      31     22
          A      31     61
Leu  CT   C      22      8
          T      14      9
          A      21     16
          G      19     11
     TT   A      15     41
          G      10     16
Lys  AA   A      60     63
          G      40     37
Pro  CC   C      42     14
          T      30     41
          A      20     40
          G       7      5
Ser  TC   C      38     10
          T      17     16
          A      18     24
          G       6      5
     AG   C      13     20
          T       7     25
Thr  AC   C      44     18
          T      27     20
          A      19     55
          G      10      8
Tyr  TA   C      48     28
          T      52     72
Phe  TT   C      52     25
          T      48     75
Val  GT   C      36      9
          T      17     10
          A      22     54
          G      25     27



Codon frequency was calculated using the program
established by the University of Wisconsin Genetics
Computer Group. Numbers represent the percentage in
which a particular codon is used. Codon usage of
non-lentiviral retroviruses was compiled from the envelope
precursor sequences of bovine leukemia virus, feline
leukemia virus, human T-cell leukemia virus type I, human
T-cell lymphotropic virus type II, the mink cell
focus-forming isolate of murine leukemia virus (MuLV), the
Rauscher spleen focus-forming isolate, the 10A1 isolate,
the 4070A amphotropic isolate and the myeloproliferative
leukemia virus isolate, and from rat leukemia virus,
simian sarcoma virus, simian T-cell leukemia virus,
leukemogenic retrovirus T1223/B and gibbon ape leukemia
virus. The codon frequency tables for the non-HIV,
non-SIV lentiviruses were compiled from the envelope
precursor sequences for caprine arthritis encephalitis
virus, equine infectious anemia virus, feline
immunodeficiency virus, and visna virus.



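In addition to the prevalence of codons containing
an A, lentiviral codons adhere to the HIV pattern of
strong CpG underrepresentation, so that the third
position for alanine, proline, serine and threonine
triplets is rarely G. The retroviral envelope triplets
show a similar, but less pronounced, underrepresentation
of CpG. The most obvious difference between lentiviruses
and other retroviruses with respect to CpG prevalence
lies in the usage of the CGX variant of arginine
triplets, which is reasonably frequently represented
among the retroviral envelope coding sequences, but is
almost never present among the comparable lentivirus
sequences.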

To examine whether regulation by rev is connected
to HIV-1 codon usage, the influence of rev on the
expression of both native and synthetic gene was
investigated. Since regulation by rev requires the
rev-binding site RRE in cis, constructs were made in which
this binding site was cloned into the 3' untranslated
region of both the native and the synthetic gene. These
plasmids were co-transfected with rev or a control
plasmid in trans into 293T cells, and gp120 expression
levels in supernatants were measured semiquantitatively
by immunoprecipitation. The procedures used in this
experiment are described in greater detail below.

As shown in FIG. 5, panels A and B, rev
upregulates the native gp120 gene, but has no effect on
the expression of the synthetic gp120 gene. Thus, the
action of rev is not apparent on a substrate which lacks
the coding sequence of endogenous viral envelope
sequences.

Expression of a Synthetic Rat THY-1 Gene with HIV
Envelope Codons

The above-described experiment suggests that
"envelope sequences" have to be present for rev
regulation. In order to test this hypothesis, a
synthetic version of the gene encoding the small,
typically highly expressed cell surface protein, rat
THY-1 antigen, was prepared. The synthetic version of
the rat THY-1 gene was designed to have a codon usage
like that of HIV gp120. In designing this synthetic gene,
AUUUA sequences, which are associated with mRNA
instability, were avoided. In addition, two restriction
sites were introduced to simplify manipulation of the
resulting gene (FIG. 6). This synthetic gene with the
HIV envelope codon usage (rTHY-1env) was generated using
three 150 to 170 mer oligonucleotides (FIG. 7). In
contrast to the syngp120mn gene, PCR products were
directly cloned and assembled in pUC12, and subsequently
cloned into pCDM7.

Expression levels of native rTHY-1 and rTHY-1 with
the HIV envelope codons were quantitated by
immunofluorescence of transiently transfected 293T cells.
FIG. 8 shows that the expression of the native THY-1
gene is almost two orders of magnitude above the background
level of the control transfected cells (pCDM7). In
contrast, expression of the synthetic rat THY-1 is
substantially lower than that of the native gene (shown
by the shift of the peak towards a lower channel number).

To prove that no negative sequence elements
promoting mRNA degradation were inadvertently introduced,
a construct was generated in which the rTHY-1env gene was
cloned at the 3' end of the synthetic gp120 gene (FIG. 9,
panel B). In this experiment 293T cells were transfected
with either the syngp120mn gene or the syngp120/rat THY-1
env fusion gene (syngp120mn.rTHY-1env). Expression was
measured by immunoprecipitation with CD4:IgG fusion
protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop
codon, rTHY-1env is not translated from this transcript.
If negative elements conferring enhanced degradation were
present in the sequence, gp120 protein levels expressed
from this construct should be decreased in comparison to
the syngp120mn construct without rTHY-1env. FIG. 9,
panel A, shows that the expression of both constructs is
similar, indicating that the low expression must be
linked to translation.



Rev-dependent expression of synthetic rat THY-1
gene with envelope codons

To explore whether rev is able to regulate
expression of a rat THY-1 gene having env codons, a
construct was made with a rev-binding site in the 3' end
of the rTHY-1env open reading frame. To measure
rev-responsiveness of a rat THY-1env construct having a
3' RRE, human 293T cells were cotransfected with
ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours
post transfection cells were detached with 1 mM EDTA in
PBS and stained with the OX-7 anti rTHY-1 mouse
monoclonal antibody and a secondary FITC-conjugated
antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described
in greater detail below.

In repeated experiments, a slight increase of
rTHY-1env expression was detected if rev was
cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system a construct
expressing a secreted version of rTHY-1env was generated.
This construct should produce more reliable data because
the accumulated amount of secreted protein in the
supernatant reflects the result of protein production
over an extended period, in contrast to surface expressed
protein, which appears to more closely reflect the
current production rate. A gene capable of expressing a
secreted form was prepared by PCR using forward and
reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage, respectively. The
PCR product was cloned into a plasmid which already
contained a CD5 leader sequence, thus generating a
construct in which the membrane anchor has been deleted
and the leader sequence exchanged for a heterologous (and
probably more efficient) leader peptide.
The rev-responsiveness of the secreted form of
ratTHY-1env was measured by immunoprecipitation of
supernatants of human 293T cells cotransfected with a
plasmid expressing a secreted form of ratTHY-1env and the
RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or
pCMVrev. The rTHY-1envPI-rre construct was made by PCR
using the oligonucleotides
cgcggggctagcgcaaagagtaataagtttaac as forward and
cgcggatcccttgtattttgtactaataa as reverse primers and the
synthetic rTHY-1env construct as template. After
digestion with NheI and NotI the PCR fragment was cloned
into a plasmid containing CD5 leader and rre sequences.
Supernatants of 35S labelled cells were harvested 72
hours post transfection, precipitated with a mouse
monoclonal antibody OX7 against rTHY-1 and anti mouse IgG
sepharose, and run on a 12% reducing SDS-PAGE.

In this experiment the induction of rTHY-1env by
rev was much more prominent and clear-cut than in the
above-described experiment, which strongly suggests that rev
is able to translationally regulate transcripts that are
suppressed by low-usage codons.

To test whether low-usage codons must be present
throughout the whole coding sequence or whether a short
region is sufficient to confer rev-responsiveness, a
rTHY-1env:immunoglobulin fusion protein was generated.
In this construct the rTHY-1env gene (without the
sequence motif responsible for phosphatidylinositol
glycan anchorage) is linked to the human IgG1 hinge, CH2
and CH3 domains. This construct was generated by anchor
PCR using primers with NheI and BamHI restriction sites
and rTHY-1env as template. The PCR fragment was cloned
into a plasmid containing the leader sequence of the CD5
surface molecule and the hinge, CH2 and CH3 parts of
human IgG1 immunoglobulin. A HindIII/EagI fragment
containing the rTHY-1enveg1 insert was subsequently
cloned into a pCDM7-derived plasmid with the rre
sequence.

To measure the response of the rTHY-1env/
immunoglobulin fusion gene (rTHY-1enveg1rre) to rev, human
293T cells were cotransfected with rTHY-1enveg1rre and either
pCDM7 or pCMVrev. The rTHY-1enveg1rre construct was made
by anchor PCR using forward and reverse primers with NheI
and BamHI restriction sites, respectively. The PCR
fragment was cloned into a plasmid containing a CD5
leader and human IgG1 hinge, CH2 and CH3 domains.
Supernatants of 35S labelled cells were harvested 72 hours
post transfection, precipitated with a mouse monoclonal
antibody OX7 against rTHY-1 and anti mouse IgG sepharose,
and run on a 12% reducing SDS-PAGE. The procedures used
are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into
the supernatant. Thus, this gene should be responsive to
rev-induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no or only a
negligible increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion
protein with native rTHY-1 or HIV envelope codons was
measured by immunoprecipitation. Briefly, human 293T
cells were transfected with either rTHY-1enveg1 (env codons)
or rTHY-1wteg1 (native codons). The rTHY-1wteg1
construct was generated in a manner similar to that used
for the rTHY-1enveg1 construct, with the exception that a
plasmid containing the native rTHY-1 gene was used as
template. Supernatants of 35S labelled cells were
harvested 72 hours post transfection, precipitated with a
mouse monoclonal antibody OX7 against rTHY-1 and anti
mouse IgG sepharose, and run on a 12% reducing SDS-PAGE.
The procedures used in this experiment are described in
greater detail below.

Expression levels of rTHY-1enveg1 were decreased
in comparison to a similar construct with wildtype rTHY-1
as the fusion partner, but were still considerably higher
than rTHY-1env. Accordingly, both parts of the fusion
protein influenced expression levels. The addition of
rTHY-1env did not [...].

[...] for all twenty amino acids. One
[...] statistical significance of this
[...] residue [...]
of two equiprobable [...]
approximately 0.004, and hence by any conventional
[...]
random. Further evidence of a skewed codon preference is
found among the more degenerate codons, where a strong
selection for triplets bearing adenine can be seen. This
[...]
position of codons [...].

The systematic exchange of native codons with
codons of highly expressed human genes dramatically
[...]
[...] ELISA showed that expression of the synthetic
[...] at least 35 fold higher in comparison to native gp120
after transient transfection into human 293 cells. The
[...] quantification
which is based on gp120 binding to CD4, only native, non-
denatured material was detected. This may explain the
apparent low expression. Measurement of cytoplasmic mRNA
levels demonstrated that the difference in protein




expression is due to translational differences and not 
mRNA stability. 

Retroviruses in general do not show a similar
preference towards A and T as found for HIV. But if this
family is divided into two subgroups, lentiviruses and
non-lentiviral retroviruses, a similar preference for A
and, less frequently, T, is detected at the third codon
position for lentiviruses. Thus, the available evidence
suggests that lentiviruses retain a characteristic
pattern of envelope codons not because of an inherent
advantage to the reverse transcription or replication of
such residues, but rather for some reason peculiar to the
physiology of that class of viruses. The major
difference between lentiviruses and non-complex
retroviruses is the presence of additional regulatory and
non-essential accessory genes in lentiviruses, as already
mentioned. Thus, one simple explanation for the
restriction of envelope expression might be that an
important regulatory mechanism of one of these additional
molecules is based on it. In fact, it is known that one
of these proteins, rev, most likely has homologues
in all lentiviruses. Thus codon usage in viral mRNA is
used to create a class of transcripts which is
susceptible to the stimulatory action of rev. This
hypothesis was proved using a strategy similar to that above,
but this time codon usage was changed in the inverse
direction. Codon usage of a highly expressed cellular
gene was substituted with the most frequently used codons
in the HIV envelope. As assumed, expression levels were
considerably lower in comparison to the native molecule,
by almost two orders of magnitude when analyzed by
immunofluorescence of the surface expressed molecule (see
4.7). If rev was coexpressed in trans and an RRE element
was present in cis, only a slight induction was found for
the surface molecule. However, if THY-1 was expressed as
a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can
probably be explained by accumulation of secreted protein
[...], which considerably amplifies the rev
effect. If rev only induces a minor increase for surface
molecules in general, induction of HIV envelope by rev
cannot have the purpose of an increased surface
abundance, but rather of an increased intracellular gp120
level. It is completely unclear at the moment why this
should be the case.

To test whether [...] of a gene
are sufficient to restrict expression and render it rev-
dependent, rTHY-1env:immunoglobulin fusion proteins were
[...] of the total
gene had the envelope codon usage. Expression levels of
[...] intermediate level, indicating
that the rTHY-1env negative sequence element is not
dominant over the immunoglobulin part. This fusion
protein was not or only slightly rev-responsive,
indicating that only genes almost completely suppressed
can be rev-responsive.

Another characteristic feature that was found in
the codon frequency tables is a striking
underrepresentation of CpG triplets. In a comparative
study of codon usage in E. coli, yeast, Drosophila and
primates it was shown that in a high number of analyzed
primate genes the 8 least used codons contain all codons
with the CpG dinucleotide sequence. Avoidance of codons
containing this dinucleotide motif was also found in the
sequence of other retroviruses. It seems plausible that
the reason for underrepresentation of CpG-bearing
triplets has something to do with avoidance of gene
silencing by methylation of CpG cytosines. The observed
number of CpG dinucleotides for HIV as a whole is about
one fifth that expected on the basis of the base
composition. This might indicate that the possibility of
high expression is restored, and that the gene in fact
has to be highly expressed at some point during viral
pathogenesis.
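The comparison of CpG counts with the number expected from base composition can be made concrete with a short calculation. The sketch below is illustrative only and is not the analysis performed for the tables referred to above: it assumes the simplest null model, in which the expected number of CpG dinucleotides is the product of the C and G frequencies times the number of dinucleotide positions. The function name is hypothetical.

```python
def cpg_obs_exp(dna):
    """Observed/expected CpG ratio under a base-composition null model."""
    # Keep only unambiguous bases so spacing and line breaks are ignored.
    seq = "".join(ch for ch in dna.lower() if ch in "acgt")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    # Count every position at which a C is immediately followed by a G.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "cg")
    freq_c = seq.count("c") / n
    freq_g = seq.count("g") / n
    # Expected count if C and G occurred independently at adjacent positions.
    expected = freq_c * freq_g * (n - 1)
    if expected == 0:
        raise ValueError("sequence lacks C or G; ratio is undefined")
    return observed / expected


# Hypothetical usage: a ratio near 1 means CpG is neither enriched nor
# depleted; a ratio near 0.2 would correspond to "about one fifth".
ratio = cpg_obs_exp("cca agg")
print(ratio)
```

Other normalisations exist (for example, per-codon-position expectations), so the value obtained depends on the null model chosen.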

The results presented herein clearly indicate that
codon preference has a severe effect on protein levels,
and suggest that translational elongation is controlling
mammalian gene expression. However, other factors may
play a role. First, abundance of not maximally loaded
mRNAs in eukaryotic cells indicates that initiation is
rate limiting for translation in at least some cases,
since otherwise all transcripts would be completely
covered by ribosomes. Furthermore, if ribosome stalling
and subsequent mRNA degradation were the mechanism,
suppression by rare codons could most likely not be
reversed by any regulatory mechanism like the one
presented herein. One possible explanation for the
influence of both initiation and elongation on
translational activity is that the rate of initiation, or
access to ribosomes, is controlled in part by cues
distributed throughout the RNA, such that the lentiviral
codons predispose the RNA to accumulate in a pool of
poorly initiated RNAs. However, this limitation need not
be kinetic; for example, the choice of codons could
influence the probability that a given translation
product, once initiated, is properly completed. Under
this mechanism, abundance of less favored codons would
incur a significant cumulative probability of failure to
complete the nascent polypeptide chain. The sequestered
RNA would then be lent an improved rate of initiation by
the action of rev. Since adenine residues are abundant
in rev-responsive transcripts, it could be that RNA
adenine methylation mediates this translational
suppression.
Detailed Procedures

The following procedures were used in the
above-described experiments.

Sequence Analysis

Sequence analyses employed the software developed
by the University of Wisconsin Computer Group.

Plasmid constructions

Plasmid constructions employed the following
methods. Vectors and insert DNA were digested at a
concentration of 0.5 µg/10 µl in the appropriate
restriction buffer for 1 to 4 hours (total reaction volume
approximately 30 µl). Digested vector was treated with
[...] phosphatase
for 30 min prior to gel electrophoresis. Both vector and
insert digests (5 to 10 µl each) were run on a 1.5% low
melting agarose gel with TAE buffer. Gel slices
containing bands of interest were transferred into a 1.5
ml reaction tube, melted at 65°C and directly added to
the ligation without removal of the agarose. Ligations
were typically done in a total volume of 25 µl in 1x Low
Buffer, 1x Ligation Additions with 200-400 U of ligase, 1
µl of vector, and 4 µl of insert. When necessary, 5'
overhanging ends were filled by adding 1/10 volume of 250
µM dNTPs and 2-5 U of Klenow polymerase to heat
inactivated or phenol extracted digests and incubating
for approximately 20 min at room temperature. When
necessary, 3' overhanging ends were filled by adding 1/10
volume of 2.5 mM dNTPs and 5-10 U of T4 DNA polymerase to
heat inactivated or phenol extracted digests, followed by
incubation at 37°C for 30 min. The following buffers
were used in these reactions: 10x Low buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60 mM
Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA,
70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60
mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml
BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Ligation
additions (1 mM ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM
spermidine); 50x TAE (2 M Tris acetate, 50 mM EDTA).
Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750
synthesizer (Millipore). The columns were eluted with 1
ml of 30% ammonium hydroxide, and the eluted
oligonucleotides were deblocked at 55°C for 6 to 12
hours. After deblocking, 150 µl of oligonucleotide were
precipitated with 10x volume of unsaturated n-butanol in
1.5 ml reaction tubes, followed by centrifugation at
15,000 rpm in a microfuge. The pellet was washed with
70% ethanol and resuspended in 50 µl of H2O. The
concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:333 (1 OD260 = [...]
µg/ml).
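The concentration estimate from a diluted OD260 reading is a simple proportional calculation. The sketch below is illustrative only. The conversion factor printed in the source is illegible, so the default used here (about 33 µg/ml per OD260 unit, a commonly quoted approximation for single-stranded DNA) is an assumption and not the value from this procedure; the function and parameter names are likewise hypothetical.

```python
def oligo_conc_ug_per_ml(od260_reading, dilution_factor=333.0,
                         ug_per_ml_per_od=33.0):
    """Estimate the stock concentration (µg/ml) from a diluted OD260 reading.

    ug_per_ml_per_od is an ASSUMED conversion factor for single-stranded
    DNA; replace it with the factor appropriate to the oligonucleotide.
    """
    if od260_reading < 0:
        raise ValueError("optical density cannot be negative")
    # Undo the dilution, then convert absorbance units to mass per volume.
    return od260_reading * dilution_factor * ug_per_ml_per_od


# Hypothetical reading of 0.5 OD260 units at the 1:333 dilution.
stock = oligo_conc_ug_per_ml(0.5)
print(f"{stock:.1f} µg/ml")
```

Keeping the dilution and the conversion factor as explicit parameters makes the assumption visible and easy to change.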

The following oligonucleotides were used for
construction of the synthetic gp120 gene (all sequences
shown in this text are in 5' to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag
aag ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac
ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc
gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg
tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag
gag gtg gag ctc gtg aac gtg acc gag aac ttc aac atg (SEQ
ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt
tga agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa
gaa caa cat (SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat
gag gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc
gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg
[...]
agg aac acc acc aac acc aac ac agc acc gcc aac [...]
agc aac agc gag [...] atg (SEQ ID
NO: 5).

oligo 2 reverse (PstI): [...]
cat ctc gcc gcc ctt (SEQ ID NO: 6).

oligo 3 forward (PstI): gaa gaa ctg caa ct[...]
cat cac cac cag c (SEQ ID NO: 7).

oligo 3: aac atc acc acc [...]
[...] agc acc agc tac cgc ctg atc [...]
[...] acc agc gtg atc acc cag gcc tgc ccc aag atc agc tt[...]
gag ccc atc ccc atc cac ta[...] gcc
(SEQ ID NO: 8).

oligo 3 reverse: gaa ctt ctt [...]
ggc ggg (SEQ ID NO: 9).

oligo 4 forward: gcg ccc [...]
[...]

oligo 4: [...]
[...] gtg gtg atc cgc agc gag [...]
gcc aag acc [...]
(SEQ ID NO: [...]).

[...]
[...] gcc cac tgc aac atc
tct aga (SEQ ID NO: 14).

oligo 5 reverse: gtc gtt cca ctt ggc tct [...]
gtt gca (SEQ ID NO: 15).

oligo 6 forward: gca aca tct cta [...]
acg ac (SEQ ID NO: 16).
oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc
gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg
ttc ac cag agc agc ggc ggc gac ccc gag atc gtg atg cac
agc ttc aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc
gcc gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat
tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc
ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac
acc acc ggc agc aac aac aat att acc ctc cag tgc aag atc
aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg
tac gcc ccc ccc atc gag ggc cag atc cgg tgc agc agc (SEQ
ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc
acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag
caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac
ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc
ccc ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg
tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc
ccc acc aag gcc aag cgc cgc gtg gtg cag cgc gag aag cgc
(SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag
cgc ttc tcg cgc tgc acc ac (SEQ ID NO: 24).

The following oligonucleotides were used for the
construction of the ratTHY-1env gene.

oligo 1 forward (BamHI/HindIII): cgc ggg gga tcc
aag ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta
tta agt gta tta caa atg agt aga gga caa aga gta ata agt
tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt
tca tta acg (SEQ ID NO: 26).
oligo 1 reverse (EcoRI/MluI): cgc ggg aaa tt[...]
cgt taa tga aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): cgc gga tcc ac[...]
gaa aaa aaa aaa cat (SEQ ID NO: 28).

oligo 2: cgt gaa aaa aaa aaa cat gta tt[...]
[...]
ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat
ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID
NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc
aca cat ata atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc
aga gta agt gga caa (SEQ ID NO: 31).

oligo 3: [...]
agt aat aaa aca ata aat gta ata aga gat aaa tta gta aaa
tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta
[...] ttt ata agt tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc
gct tca taa act tat aaa atc (SEQ ID NO: 33).

Short, overlapping 15 to 25 mer oligonucleotides
annealing at both ends were used to amplify the long
oligonucleotides by polymerase chain reaction (PCR).
Typical PCR conditions were: 35 cycles, 55°C annealing
temperature, 0.2 sec extension time. PCR products were
gel purified, phenol extracted, and used in a subsequent
PCR to generate longer fragments consisting of two
adjacent small fragments. These longer fragments were
cloned into a CDM7-derived plasmid containing a leader
sequence of the CD5 surface molecule followed by a
NheI/PstI/MluI/EcoRI/BamHI polylinker.
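Joining two adjacent fragments in a subsequent PCR relies on the reverse primer of one fragment and the forward primer of the next sharing a complementary overlap. The Python sketch below illustrates how such an overlap can be checked. It is an illustration rather than part of the described procedure: the helper names are hypothetical, and the two primer sequences are transcribed from the listing above with the assumption that each "e" in the scanned text stands for "c".

```python
def reverse_complement(dna):
    """Reverse complement of a DNA string; triplet spacing is ignored."""
    complement = {"a": "t", "c": "g", "g": "c", "t": "a"}
    seq = dna.lower().replace(" ", "")
    return "".join(complement[base] for base in reversed(seq))


def longest_shared_run(a, b):
    """Longest common substring of a and b (row-by-row dynamic programming)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


# gp120 "oligo 1 reverse" and "oligo 2 forward" (OCR-corrected, see lead-in).
oligo1_reverse = "cca cca tgt tgt tct tcc aca tgt tga agt tct c"
oligo2_forward = "gac cga gaa ctt caa cat gtg gaa gaa caa cat"

overlap = longest_shared_run(reverse_complement(oligo1_reverse),
                             oligo2_forward.replace(" ", ""))
print(len(overlap), overlap)
```

A long shared run indicates that the top strand of one fragment can prime on the other, which is what allows the two gel-purified products to be fused in the second round of PCR.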

The following solutions were used in these
reactions: 10x PCR buffer (500 mM KCl, 100 mM Tris HCl,
pH 7.5, 8 mM MgCl2, 2 mM each dNTP). The final buffer
was complemented with 10% DMSO to increase fidelity of
the Taq polymerase.

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB
cultures for more than 6 hours or overnight.
Approximately 1.5 ml of each culture was poured into 1.5
ml microfuge tubes, spun for 20 seconds to pellet cells
and resuspended in 200 µl of solution I. Subsequently
400 µl of solution II and 300 µl of solution III were
added. The microfuge tubes were capped, mixed and spun
for > 30 sec. Supernatants were transferred into fresh
tubes and phenol extracted once. DNA was precipitated by
filling the tubes with isopropanol, mixing, and spinning
in a microfuge for > 2 min. The pellets were rinsed in
70% ethanol and resuspended in 50 µl dH2O containing 10
µl of RNAse A. The following media and solutions were
used in these procedures: LB medium (1.0% NaCl, 0.5%
yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH
8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III
(2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH
adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl,
pH 7.5, 1 mM EDTA pH 8.0).
Large scale DNA preparation

One liter cultures of transformed bacteria were
grown 24 to 36 hours (MC1061p3 transformed with pCDM
derivatives) or 12 to 16 hours (MC1061 transformed with
pUC derivatives) at 37°C in either M9 bacterial medium
(pCDM derivatives) or LB (pUC derivatives). Bacteria
were spun down in 1 liter bottles using a Beckman J6
centrifuge at 4,200 rpm for 20 min. The pellet was
resuspended in 40 ml of solution I. Subsequently, 80 ml
of solution II and 40 ml of solution III were added and
the bottles were shaken semivigorously until lumps of 2
to 3 mm size developed. The bottle was spun at 4,200 rpm
for 5 min and the supernatant was poured through
[...] and added to 4.5 g of cesium chloride, 0.3 ml of
[...] ethidium bromide, and 0.1 ml of 1% Triton X100
solution. [...]
at 10,000 rpm for 5 min. The supernatant was transferred
into Beckman Quick Seal ultracentrifuge tubes,
[...] ultracentrifuge using
[...] hours.
The band was extracted by visible light using a 1 ml
[...]
added to the extracted material. DNA was extracted once
[...]
ethanol, mixed, and spun in a Beckman [...] centrifuge at
[...]
ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA
concentration was determined by measuring the optical
density at 260 nm in a [...].

The following media and buffers were used in these
procedures: M9 bacterial medium (10 g M9 salts, 10 g
casamino acids (hydrolysed), 10 ml M9 additions, 7.5
µg/ml tetracycline (500 µl of a 15 mg/ml stock solution),
12.5 µg/ml ampicillin (125 µl of a 10 mg/ml stock
[...]
µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl, 0.5
% yeast extract, 1.0% tryptone); Solution I (10 mM EDTA
[...]
(2.5 M KOAc, 2.5 M HOAc).
Sequencing

Synthetic genes were sequenced by the Sanger
dideoxynucleotide method. In brief, 20 to 50 µg
double-stranded plasmid DNA were denatured in 0.5 M NaOH for 5
min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of
ethanol and centrifuged for 5 min. The pellet was washed
with 70% ethanol and resuspended at a concentration of 1
µg/µl. The annealing reaction was carried out with 4 µg
of template DNA and 40 ng of primer in 1x annealing
buffer in a final volume of 10 µl. The reaction was
heated to 65°C and slowly cooled to 37°C. In a separate
tube 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of
dH2O, 1 µl of [35S] dATP (10 µCi), and 0.25 µl of
Sequenase (12 U/µl) were added for each reaction. Five
µl of this mix were added to each annealed primer-
template tube and incubated for 5 min at room
temperature. For each labeling reaction 2.5 µl of each
of the 4 termination mixes were added on a Terasaki plate
and prewarmed at 37°C. At the end of the incubation
period 3.5 µl of labeling reaction were added to each of
the 4 termination mixes. After 5 min, 4 µl of stop
solution were added to each reaction and the Terasaki
plate was incubated at 80°C for 10 min in an oven. The
sequencing reactions were run on 5% denaturing
polyacrylamide gel. An acrylamide solution was prepared
by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to
100 g of acrylamide:bisacrylamide (29:1). 5%
polyacrylamide 46% urea and 1x TBE gel was prepared by
combining 38 ml of acrylamide solution and 28 g urea.
Polymerization was initiated by the addition of 400 µl of
10% ammonium peroxodisulfate and 60 µl of TEMED. Gels
were poured using silanized glass plates and sharktooth
combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4
hours (depending on the region to be read). Gels were
transferred to Whatman blotting paper, dried [...]
about 1 hour, and exposed to x-ray film at room
temperature. Typically exposure time was 12 hours.

The following solutions were used in these procedures: 5x
Annealing buffer (200 mM Tris HCl, pH 7.5, [...]
250 mM NaCl); Labelling Mix (7.5 µM [...]
dTTP); Termination Mixes (50 µM each [...]);
[...] xylene cyanol); [...] TBE
[...]; Polyacrylamide solution ([...]
TBE, 957 ml dH2O).
RNA isolation

Cytoplasmic RNA was isolated [...]
phosphate transfected 293T cells 36 hours post
[...]
extracts were incubated at 37°C for 20 min,
phenol/chloroform extracted twice and [...].

The following solutions were used in this
procedure: Lysis Buffer (TE containing 50 mM Tris pH
8.0, 100 mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer I (TE
[...]
RNAse inhibitor, 0.1 U/µl RNAse free DNAse I); Stop
buffer (50 mM EDTA, 1.5 M NaOAc, 1.0% SDS).
Slot blot analysis

For slot blot analysis 10 µg of cytoplasmic RNA
was dissolved in 50 µl dH2O to which 150 µl of 10x
SSC/18% formaldehyde were added. The solubilized RNA was
then incubated at 65°C for 15 min and spotted onto
nitrocellulose with a slot blot apparatus. Radioactively
labelled probes of 1.5 kb gp120IIIb and syngp120mn
fragments were used for hybridization. Each of the two
fragments was random labelled in a 50 µl reaction with
10 µl of 5x oligo-labelling buffer, 8 µl of 2.5 mg/ml
BSA, 4 µl of α[32P]-dCTP (20 µCi/µl; 6000 Ci/mmol), and
5 U of Klenow fragment. After 1 to 3 hours incubation at
37°C 100 µl of TE were added and unincorporated
α[32P]-dCTP was eliminated using a G50 spin column.
Activity was measured in a Beckman beta-counter, and
equal specific activities were used for hybridization.
Membranes were pre-hybridized for 2 hours and hybridized
for 12 to 24 hours at 42°C with 0.5 x 10[...] cpm probe
per ml hybridization fluid. The membrane was washed twice
(5 min) with washing buffer I at room temperature, for
one hour in washing buffer II at 65°C, and then exposed
to x-ray film. Similar results were obtained using a 1.1
kb NotI/SfiI fragment of pCDM7 containing the 3'
untranslated region. Control hybridizations were done in
parallel with a random-labelled human beta-actin probe.
RNA expression was quantitated by scanning the hybridized
nitrocellulose membranes with a Magnetic Dynamics
phosphorimager.

The following solutions were used in this
procedure:
5x Oligo-labelling buffer (250 mM Tris HCl, pH 8.0, 25 mM
MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, 2 mM
dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6);
Hybridization Solution ([...] M sodium phosphate, 250 mM
NaCl, 7% SDS, 1 mM EDTA, 5% dextran sulfate, 50%
formamide, 100 µg/ml denatured salmon sperm DNA); Washing
buffer I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC,
0.1% SDS); 20x SSC (3 M NaCl, 0.3 M Na3citrate, pH
adjusted to 7.0).
Vaccinia recombination

Vaccinia recombination used a modification of the
method described by Romeo and Seed (Romeo and
Seed, Cell, 64: 1037, 1991). Briefly, CV1 cells at 70 to
90% confluency were infected with 1 to 3 µl of a wildtype
vaccinia stock WR (2 x 10[...] pfu/ml) for 1 hour in culture
medium without calf serum. After 24 hours, the cells
were transfected by calcium phosphate with 25 µg TKG
plasmid DNA per dish. After an additional 24 to 48 hours
the cells were scraped off the plate, spun down, and
resuspended in a volume of 1 ml. After 3 freeze/thaw
cycles trypsin was added to 0.05 mg/ml and lysates were
incubated for 20 min. A dilution series of 10, 1 and 0.1
µl of this lysate was used to infect small dishes (6 cm)
of CV1 cells that had been pretreated with 12.5 µg/ml
mycophenolic acid, 0.25 mg/ml xanthine and 1.36 mg/ml
hypoxanthine for 6 hours. Infected cells were cultured
for 2 to 3 days, and subsequently stained with the
monoclonal antibody NEA9301 against gp120 and an alkaline
phosphatase conjugated secondary antibody. Cells were
incubated with 0.33 mg/ml NBT and 0.16 mg/ml BCIP in AP-
buffer and finally overlaid with 1% agarose in PBS.
Positive plaques were picked and resuspended in 100 µl
Tris pH 9.0. The plaque purification was repeated once.
To produce high titer stocks the infection was slowly
scaled up. Finally, one large plate of HeLa cells was
infected with half of the virus of the previous round.
Infected cells were detached in 3 ml of PBS, lysed with a
Dounce homogenizer and cleared from larger debris by
centrifugation. VPE-8 recombinant vaccinia stocks were
kindly provided by the AIDS repository, Rockville, MD,
and express HIV-1 IIIB gp120 under the 7.5 mixed
early/late promoter (Earl et al., J. Virol., 65:31,
1991). In all experiments with recombinant vaccinia, cells
were infected at a multiplicity of infection of at least
10.

The following solution was used in this procedure: 
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM 
MgCl2) 

Cell culture 

The monkey kidney carcinoma cell lines CV1 and
Cos7, the human kidney carcinoma cell line 293T, and the
human cervix carcinoma cell line HeLa were obtained from
the American Type Culture Collection and were maintained
in supplemented IMDM. They were kept on 10 cm tissue
culture plates and typically split 1:5 to 1:20 every 3 to
4 days. The following medium was used in this
procedure:
Supplemented IMDM (90% Iscove's modified Dulbecco Medium,
10% calf serum, iron-complemented, heat inactivated 30
min 56°C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamycin, 0.5
mM β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5
ml)).

Transfection 

Calcium phosphate transfection of 293T cells was
performed by slowly adding 10 µg plasmid DNA in 250 µl
0.25 M CaCl2 to the same volume of 2x HEBS buffer while
vortexing. After incubation for 10 to 30 min at room
temperature the DNA precipitate was added to a small dish
of 50 to 70% confluent cells. In cotransfection
experiments with rev, cells were transfected with 10 µg
gp120IIIb, gp120IIIbrre, syngp120mnrre or rTHY-1enveg1rre
and 10 µg of pCMVrev or CDM7 plasmid DNA.
The following solutions were used in this
procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, [...]
sterile filtered); 0.25 M CaCl2 (autoclaved).

Immunoprecipitation

After 48 to 60 hours medium was exchanged and
cells were incubated for an additional 12 hours in
Cys/Met-free medium containing 200 µCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of protease
inhibitors leupeptin, aprotinin and PMSF to 2.5 µg/ml, 50
µg/ml and 100 µg/ml respectively, 1 ml of supernatant was
incubated with either 10 µl of packed protein A sepharose
alone (rTHY-1enveg1rre) or with protein A sepharose and 3
µg of a purified CD4/immunoglobulin fusion protein
(kindly provided by Behring) (all gp120 constructs) at
4°C for 12 hours on a rotator. Subsequently the protein
A beads were washed 5 times for 5 to 15 min each time.

.IT\T. ""^^ " °' containing 

7% (all gp120 constructs) or 10% (rTHY-1enveg1rre) SDS
polyacrylamide gels (TRIS pH 8.8 buffer in the resolving,
TRIS pH 6.8 buffer in the stacking gel, TRIS-glycine
running buffer, Maniatis et al. 1989). Gels were fixed
in 10% acetic acid and 10% methanol, incubated with
Amplify for 20 min, dried and exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM
Tris, 1.25 M Glycine, 0.5% SDS); Loading buffer (10%
glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromphenol
blue).

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
were stained with the monoclonal antibody OX-7 in a
dilution of 1:250 at 4°C for 20 min, washed with PBS and
subsequently incubated with a 1:500 dilution of a FITC-
conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of a
fixing solution, and analyzed on an EPICS XL
cytofluorometer (Coulter).

The following solutions were used in this
procedure:

PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
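
The PBS recipe above is given in millimolar terms. The sketch below converts it to grams per batch. It assumes anhydrous salts (hydrated forms would need other molar masses), and the function name `grams_per_batch` is hypothetical.

```python
# Molar masses in g/mol for the anhydrous salts (assumption: no hydrates used).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}

# PBS composition in mM, as stated in the text.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 4.3, "KH2PO4": 1.4}


def grams_per_batch(recipe_mm: dict, volume_l: float = 1.0) -> dict:
    """Mass of each salt (g) for `volume_l` litres at the given mM concentrations."""
    return {
        salt: round(mm / 1000.0 * MOLAR_MASS[salt] * volume_l, 3)
        for salt, mm in recipe_mm.items()
    }


masses = grams_per_batch(PBS_MM, volume_l=1.0)
for salt, grams in masses.items():
    print(f"{salt}: {grams} g")
```

The pH adjustment to 7.4 is a separate step and is not modelled here.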
ELISA

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 µl of goat anti-
gp120 antisera diluted 1:200 were added for 2 hours. The
plates were washed again and incubated for 2 hours with a
peroxidase-conjugated rabbit anti-goat IgG antiserum
1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 µl of substrate solution
containing 2 mg/ml o-phenylenediamine in sodium citrate
buffer. The reaction was finally stopped with 100 µl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml o-
phenylenediamine in sodium citrate buffer).
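
One common way to turn a 490 nm reading into a concentration is to interpolate against a dilution series of a protein of known concentration. The text states only that purified recombinant gp120IIIb served as a control, so the use of a standard curve, the function name `interpolate_concentration` and all numbers below are assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate_concentration(od: float, standards: list) -> float:
    """Piecewise-linear interpolation of concentration from an OD reading.

    `standards` is a list of (concentration, OD) pairs from a dilution series.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by OD
    ods = [p[1] for p in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the range covered by the standards")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0]
    (c0, o0), (c1, o1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (od - o0) / (o1 - o0)


# Hypothetical standard series: (ng/ml, OD at 490 nm).
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 0.85), (100.0, 1.45)]
estimate = interpolate_concentration(0.55, standards)
print(f"{estimate:.1f} ng/ml")
```

Readings outside the standard range raise an error rather than being extrapolated, since the peroxidase signal is not expected to stay linear there.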
Use

The synthetic genes of the invention are useful
for expressing a protein normally expressed in
mammalian cells in cell culture [...]
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: SEED, BRIAN

(ii) TITLE OF INVENTION: OVEREXPRESSION OF MAMMALIAN AND VIRAL
PROTEINS

(iii) NUMBER OF SEQUENCES: 37

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Fish & Richardson
(B) STREET: 225 Franklin Street
(C) CITY: Boston
(D) STATE: Massachusetts
(E) COUNTRY: U.S.A.
(F) ZIP: 02110-2804

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, version #1.308

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: 08/308,286
(B) FILING DATE: 19-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: CLARK, PAUL T
(B) REGISTRATION NUMBER: 30,162
(C) REFERENCE/DOCKET NUMBER: 00786/226001

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (617) 542-5070
(B) TELEFAX: (617) 542-8906
(C) TELEX: 200154



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CGCGGGCTAG CCACCGAGAA GCTG
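
Each entry of the listing declares a length that can be checked against the printed sequence. The sketch below does this for SEQ ID NO: 1. The helper name `clean_sequence` is hypothetical, and the check assumes that the ten-base groups and any trailing position number are separated by spaces.

```python
def clean_sequence(listing_line: str) -> str:
    """Join the ten-base groups of a listing line, dropping any position number."""
    groups = [g for g in listing_line.split() if not g.isdigit()]
    return "".join(groups).upper()


seq1 = clean_sequence("CGCGGGCTAG CCACCGAGAA GCTG")

# Only unambiguous bases are expected in a clean entry.
assert set(seq1) <= set("ACGT")
print(len(seq1))  # compare with the declared LENGTH field
```

A mismatch between the computed and declared length, or a character outside A, C, G and T, is a quick indicator of a transcription error in an entry.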
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 196 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ACCCACAACC TCTCCCKAC CCTCTACTAC CCCCTOCCCC TGTCCAAGAO ACCCCACCAC 60
CACCCTCTTC TCCGCCACCC ACCCCAACCC CTACCACACC CACCTGCACA ACCTCTCCCC 120
CACCCACCCC TOCCTCCCCA CCCACCCCAA CCCCCACCAC CTCCACCTCG TCAACCTOAC 180
CGACAACTTC AACATC 196
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCACCATCTT GTTCTTCCAC ATGTTGAACT TCTC
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GACCGACAAC TTCAACATGT GGAAGAACAA CAT 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 192 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TCGAAGAACA ACATCGTCGA CCAGATGCAT CACCACATCA TCAGCCTCTC GGACCAGACC 60

CTGAACCCCT GCGTGAAGCT GACCCCCTGT GCCTGACCTG AACTGCACCG ACCTGAGGAA 120

CACCACCAAC ACCAACACAC CACCGCCAAC AACAACAGCA ACAGCGAGGG CACCATCAAG 180

GGCGGCGAGA TG 192

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTTGAAGCTG CAGTTCTTCA TCTCGCCGCC CTT 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGAACTCC ACCTTCAACA TCACCACCAG C 31
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AACATCACCA CCAGCATCCG CGACAAGATG CAGAAGGAGT ACGCCCTGCT GTACAAGCTG 60

CATATCGTCA GCATCGACAA CGACAGCACC AGCTACCCCC TGATCTCCTC CAACACCAGC 120

GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180

CCCGCCCGCT TCGCC 195
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CAACTTCTTC TCGCCCCOGA ACCCCCCGCO
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GCGCCCCCGC CCCCTTCGCC ATCCTCAAGT CCAACCACAA CAACTTC 
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 198 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCCGACAAGA AGTTCAGCGG CAAGGGCAGC TGCAAGAACG 'tGAGCACCCT GCAGTGCACC 60
CACGGCATCC CCCCGCTGGT GACCACCCAG CTCCTGCTCA ACGGCACCCT OGCCGAGGAG 120
CAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACCCCA AGACCATCAT CCTGCACCTC 180
AATGACACCC TCCACATC 198
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGTTGGCACC CCTCCACTTC ATCTGCACCC TCTC 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAGACCGTGC AGATCAACTG CACGCGTCCC 30 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 120 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AACTGCACGC GTCCCAACTA CAACAAGCCC AAGCCCATCC ACATCGGCCC CCGGCGCGCC 60 
TTCTACACCA CCAACAACAT CATCCGCACC ATCCTCCACG CCCACTGCAA CATCTCTACA 120 



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GTCGTTCCAC TTCGCTCTAG AGATGTTGCA 30 
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GCAACATCTC TAGAGCCAAG TGGAACGAC 29 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 131 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CCCAACTCCA ACCACACCCT CCCCCACATC CTCACCAACC TCAA^.«.. 

AACACCAXCC «„CACCAC ACCACCCCCC CCCACC^l " 
ACTCCCCCCC C «:ACCCCCA CATCCTCATC CACAeCTTCA x^O 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCAOTACAAC AATTCCCCGC COCACTTOA 
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCAACTCCGC CCCCGAATTC TTCTACTGC 29

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CCCGAATTCT TCTACTGCAA OVCCAGCCCC CTGTTCAACA CCACCTGCAA CGCCAACAAC eo 
ACCTGGAACA ACACCACCGG CAGCAACAAC AATATTACCC TCCAGTGCAA CATCAAG^^ j 

ATCOCCTGCA GCACC 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GCAGACCGGT CATGTTGCTG CTGCACCGGA TCTGCCCCTC 40 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
CGACGGCCAG ATCCCGTGCA CCAGCAACAT CACCCGTCTG 40 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

AACATCACCG GTCTGCTGCT CCTGCTCACC CGGACGGCGG CAACGACACC GACACCAACG 60 

ACACCCAAAT CTTCCCCGAC GGCGGCAACG ACACCAACGA CACCGAAATC TTCCCCCCCG 120 

GCCGCGGCCA CATGCGCCAC AACTGCAGAT CTGAGCTGTA CAAGTACAAG GTGCTGACGA 180 

TCGACCCCCT CGGCGTGGCC CCCACCAAGG CCAAGCGCGC GGTGGTGCAG CCCCAGAAGC 240 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCCGOCCCCC COCTTTACCO CTTCTCGCCC TGCACCAC 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CCCGGGGGAT CCAACCTTAC CATGATTCCA GTAATAAGT 

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATCAC 

ACAGTAATAA GTTTAACAGC AXCTTTAGTA AATCAAAA^ 1^^^^ " 
C^XAATACAAATTTGCCAATACAACATGAA^T"^^^^^^^^ - 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CCCGCGCAAT TCACCCCTTA ATCAAAATTC ATGTTG 36
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CGCCGATCCA CCCGTGAAAA AAAAAAACAT 30 

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 149 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

CCTGAAAAAA AAAAACATCT ATTAACTGGA ACATTACGAG TACCAGAACA TACATATAGA 60 

AGTAGAGTAA TTTGTTTAGT CATAGATTCA TAAAAGTATT AACATTAGCA AATTTTACAA 120 

CAAAACATGA ACGAGATTAT ATGTGTGAG 149 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CCCGAATTCC AGCTCACACA TATAATCTCC 30 
(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CGCCGATCCG AGCTCAGAGT AAGTCGACAA 30 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 170 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

^rr^r^ a.tc«.o«. .x..otxxxx x.cx.c^ x.c^oxxco ZZ27r 
TATXAxxx^ XXX.XCXXXX xx.c..ca. ccxxxxx^x ^.ro. 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CCCCAAIXCC CGCCCCCXXC ATAAACXXAX AAAAXC 
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CTCCACATCC ATTCTCCTCT AAACCACATA CCCOCCCACA 'cACCCTCACC TCCCC«CCC 60
ACCTCCCCAC CCTOACCCAA CACAACGCCA GAAACCATCC CCATCCCCTC TCTCCAACCC 120
-CCCACCT TCTACCTCCX COOC.TCCXC CTCCCXXCCC XCCXACCCAC CCACAACC. 180
TOOCTCACCO TCTACTACCC CCXCCCCCTO XCCAAOCACC CCACCACCAC CCTCTTC^C 240
CCCACCOACO CCAACCCCXA CCACACC«AC C«^CAACC XCTCCOCCAC CCACCCC«C 300
CTCCCCACCC ACCCCAACCC CCACCACCT. CACCTCCTCA ACCTCACCCA CAACXXCAAC 360
ATCXCCAACA ACAACATCCX CCACCAOATC CATCAGCACA TCAXCACCCT CTCCCACCAC 420
ACCCTCAACC CCTCCCTCAA CCTCACCCCC CTCTCCCTCA CCC«,AACTC CACCCACC^ 480
ACCAACACCA CCAACACCAA CAACACCACC CCCAACAACA ACAGCAACAO CGAGGGCACC 540
AXCAAGGGCG GCGAGATCAA CAACTGCAGC TTCAACATCA CCACCAGCAT CCCCCACAAG 600
AXGCAGAAGG AGTACGCCCT GCXGTACAAG CXGGAXAXC« XGAGCAXCGA CAACGACAGC 660
ACCAGCTACC GCCTGATCTC CTGCAACACC AGCGTGATCA CCCAGGCCTG GCCCAAGATC 720
AGCXXCGACC CCATCCCCAT CCACTACTGC GCCCCCGCCG GCTTCGCCAT CCTGAAGXGC 780
AACCACAAGA AGXTCAGCGG CAAGGGCAGC I^CAAGAACG TGAGCACCGX GCAGTGCACC 840
G6COGGTCGT GAGCACCCAG ACCGCACCCT 900
TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT 960
AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCCCAA CC6CATCCAC 1020
ATCGGCCCCG GGCGCGCCTT CTACACCACC AAGAACATCA TCGGCACCAT CCCeCACGCC 1080
CACTGCAACA TCTCTAGAGC CAAGTGGAAC GACACCCTGC GCCAGATCGT GAGCAAGCTG 1140
AAGGAGCAGT TCAAGAACAA GACCATCGTG rrCAACCAGA GCAGCGGCGG CGACCCCGAG 1200
ATCGTGATGC ACAGCTTCAA CTGCGGCGGC CAATTCTTCT ACTCCAACAC CAGCCCCCTG 1260
TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA CCACCCGCAC CAACAACAAT 1320
ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATGT GGCAGGAGGT GGGCAAGGCC 1380
ATGTACGCCC CCCCCATCCA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG 1440
CTGACCCCCC ACCGCGGCAA GGACACCGAC ACCAACCACA CCGAAATCTT CCGCCCCGOC 1500
GGCCGCGACA TGCGCGACAA CTCGAGATCT GAGCTGTACA AGTACAAGGT CGTGACGATC 1560
GAGCCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCGCG TGGTGCAGCG CGAGAAGCGC 1620
TAAAGCGGCC GC 1632



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2461 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ACCGAGAAGC TGTGGCTGAC CGTGTACTAC GGCGTGCCCG TGTCGAAGGA GGCCACCACC 60
ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACCACACCG AGCTGCACAA CCTCTCCCCC 120
ACCCAGGCGT GCGTGCCCAC CGACCCCAAC CCCCAGCAGG TCGAGCTCGT GAACGTGACC 180
GAGAACTTCA ACATGTGGAA GAACAACATG CTGGAGCAGA TGCATGAGGA CATCATCAGC 240
CTGTGGOACC AGAGCCTGAA GCCCTGCGTG AAGCTGACCC CCCTGTGCGT GACCCTGAAC 300
TGCACCCACC TGAGCAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360
AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA GCTTCAACAT CACCACCAGC 420
ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA AGCTGGATAT CGTGAGCATC 480
CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540
TGCCCCAAGA TCACCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CGGCTTCGCC 600
ATCCTGAAGT GCAACGACAA GAAGTTCAGC GGCAAGGGCA GCTGCAAGAA CGTGACCACC 660
GTGCAGTCCA CCCACXWCAT CCCCCCCCTC CTGAGCACCC ACCTCCTCCT CAACCCCACC 720
CTGGCCGAGG AGGAGGTGGT GATCCGCAGC CACAACTTCA CCCACAACGC CAAGACCATC 780
ATOGTGCACC TGAATCAGAO CGTGCAGATC AACTGCACCC CTCCCAACTA CAACAAGCCC 840
AAGCCCATCC ACATCGGCCC CCCGCGCGCC TTCTACACCA CCAAGAACAT CATCCCCACC 900
ATCOGCCAGC CCCACTGCAA CATCTCTAGA GCCAACTCGA ACGACACCCT CCCCCACATC 960
GTGACCAAGC TGAAGGAGCA GTTCAAGAAC AAGACCATCC TCTTCAACCA CAGCACCCGC 1020
GGCGACCCCC ACATCGTGAT GCACAGCTTC AACTCCCCCG CCCAATTCTT CTACTCCAAC 1080
ACCAGCCCCC TGTTCAACAG GACCTGGAAC GCCAACAACA CCTCCAACAA CACCACCCCC 1140
AGCAACAACA ATATTACCCT CCAGTGCAAG ATCAACCAGA TCATCAACAT GTCCCAGGAG 1200
GTGGGCAAGG CCATGTACGC CCCCCCCATC GAGGGCCAGA TCCGCTGCAC CAGCAACATC 1260
ACCGGTCTGC TCCTGACCCC CGACGGCGGC AACGACACCG ACACCAACGA CACCGAAATC 1320
TTCCCCCCCC GCCGCCCCGA CA7GCGCGAC AACTCGAGAT CTGAGCTCTA CAACTACAAC 1380
GTCGTCACCA TCCAGCCCCT GGCCGTGGCC CCCACCAACC CCAAGCGCCC CCTGGTGCAG 1440
CGCGAGAACC CGCCCCCCAT CCGCCCCCTG TTCCTCGGCT TCCTGGCGGC GGCGGGCAGC 1500
ACCATGCCGG CCGCCACCCT CACCCTGACC GTGCAGGCCC CCCTCCTCCT CACCCCCATC 1560
GTGCAGCAGC ACAACAACCT CCTCCGCGCC ATCGAGGCCC ACCACCATAT GCTCCAGCTC 1620
ACCCTCTGGC CCATCAACCA GCTCCAGGCC CCCCTCCTGC CCGTCCACCC CTACCTGAAC 1680
CACCAGCAGC TCCTCGGCTT CTGCCCCTCC TCCCGCAACC TGATCTGCAC CACCACCGTA 1740
CCCTCCAACC CCTCCTGCAC CAACAAGAGC CTCGACCACA TCTCGAACAA CATCACCTXK; 1800
ATCCAGTCCG AGCGCCACAT CGATAACTAC ACCAGCCTCA TCTACAGCCT GCTGGAGAAC 1860
AGCCAGACCC AGCAGGAGAA CAACGAGCAC GAGCTCCTCG AGCTGGACAA CTCGGCGAGC 1920
CTGTCGAACT CCTTCCACAT CACCAACTCC CTCTCCTACA TCAAAATCTT CATCATGATT 1980
CTCGCCGCCC TGGTGGGCCT CCGCATCGTC TTCGCCGTCC TGAGCATCCT GAACCCCGTG 2040
CCCCAGGGCT ACAGCCCCCT CAGCCTCCAG ACCCGGCCCC CCGTGCCGCG CGGGCCCCAC 2100
CGCCCCGAGG GCATCGAGGA CGACGGCGGC CACCGCGACC CCGACACCAG CGGCAGGCTC 2160
CTCCACCCCT TCCTGGCCAT CATCTCGCTC GACCTCCCCA CCCTGTTCCT GTTCAGCTAC 2220
CACCACCCCC ACCTGCTGCT GATCCCCGCC CGCATCGTCG AACTCCTACG CCXJCCCCCGC 2280
TGCCACGTCC TGAACTACTG CTGCAACCTC CTCCAGTATT GGAGCCAGGA GCTCAAGTCC 2340
ACCCCCGTCA GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCGAGGGCAC CCACCGCCTC 2400
ATOGAGCTCC TCCAGAGGGC CGGCACGGCG ATCCTCCACA TCCCCACCCG CATCCCCCAG 2460
CGGCTCCACA CCCCGCTCCT 2481
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 486 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA 60 

AGAGTAATAA GTTTAACACC ATGTTTAGTA AATCAAAATT TGAGATTAGA TTGTAGACAT 120 

GAAAATAATA CACCTTTGCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180

CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTG 240 

TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300 

GATTATATGT GTGAGCTCAG AGTAAGTGGA CAAAATCCAA CAAGTAGTAA TAAAACAATA 360 

AATGTAATAA GAGATAAATT AGTAAAATGT GGAGGAATAA GTTTATTAGT ACAAAATACA 420 

AGTTGGTTAT TATTATTATT ATTAAGTTTA AGTTTTTTAC AAGCAACAGA TTTTATAAGT 480 

TTATGA 486 
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 485 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

ATGAACCCAG TCATCAGCAT CACTCTCCTG CTTTCAGTCT TGCAGATGTC CCGAGGACAG 60
AGGGTGATCA CCCTGACAGC CTGCCTGGTG AACAGAACCT TCGACTGGAC TGCCGTCATG 120
AGAATAACAC CAACTTGCCC ATCCAGCATG AGTTCAGCCT GACCCGAGAG AAGAAGAACC 180
ACGTGCTGTC AGGCACCCTG GGGGTTCCCG AGCACACTTA CCGCTCCCGC GTCAACCTTT 240
TCAGTGACCO CTTTATCAAG GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300
ACTACATGTG TGAACTTCGA GTCTCGGGCC AGAATCCCAC AAGCTCCAAT AAAACTATCA 360
ATGTGATCAG AGACAAGCTG GTCAAGTGTG GTCGCATAAG CCTGCTGGTT CAAAACACTT 420
CCTGGCTGCT GCTGCTCCTG CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480
TGTGA 485
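
SEQ ID NO: 36 and SEQ ID NO: 37 begin with different codons for what appears to be the same polypeptide. The sketch below translates the first 60 bases of each entry with the standard genetic code and compares the results together with the G+C content. The function names `translate` and `gc_fraction` are illustrative and form no part of the listing.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons of `dna`; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_fraction(dna: str) -> float:
    """Fraction of G and C bases in `dna`."""
    return sum(base in "GC" for base in dna) / len(dna)


# First 60 bases of each entry, copied from the listing.
seq36_start = ("ATGAATCCAG TAATAAGTAT AACATTATTA "
               "TTAAGTGTAT TACAAATGAG TAGAGGACAA").replace(" ", "")
seq37_start = ("ATGAACCCAG TCATCAGCAT CACTCTCCTG "
               "CTTTCAGTCT TGCAGATGTC CCGAGGACAG").replace(" ", "")

print(translate(seq36_start))
print(translate(seq37_start))
print(gc_fraction(seq36_start), gc_fraction(seq37_start))
```

Over this stretch the two entries translate to the same residues, while the SEQ ID NO: 36 fragment is markedly poorer in G and C than the SEQ ID NO: 37 fragment.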



What is claimed is: 
1. A synthetic gene encoding a protein normally
expressed in mammalian cells wherein at least one non-
preferred or less preferred codon in the natural gene
encoding said protein has been replaced by a
preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 110% of that
expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least ten times that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.
7. The synthetic gene of claim 1 wherein at least
10% of the codons in said natural gene are non-preferred
codons.

8. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.

9. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by
preferred codons.

10. The synthetic gene of claim 1 wherein at
least 90% of the non-preferred codons and less preferred
codons present in said natural gene have been replaced by
preferred codons.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral or lentiviral protein.

12. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

13. The synthetic gene of claim 12 wherein said
protein is selected from the group consisting of gag,
pol, and env.

14. The synthetic gene of claim 13 wherein said
protein is gp120 or gp160.

15. The synthetic gene of claim 1 wherein said
protein is a human protein.
16. A method for preparing a synthetic gene
encoding a protein normally expressed by mammalian cells,
comprising identifying non-preferred and less-preferred
codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and less-
preferred codons with a preferred codon encoding the same
amino acid as the replaced codon.
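
The identifying and replacing steps recited in claim 16 can be sketched in a few lines. The preferred and less preferred codon lists below are taken from the Summary of the Invention. The function names are hypothetical, and leaving a codon unchanged when its amino acid has no listed preferred codon (Met, Trp, Glu and stop codons) is an assumption of this sketch rather than a statement of the claimed method.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred codon per amino acid, as listed in the Summary of the Invention.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}
# Less preferred codons, from the same passage.
LESS_PREFERRED = {"GGG", "ATT", "CTC", "TCC", "GTC"}


def split_codons(dna: str) -> list:
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def classify(codon: str) -> str:
    """Label a codon according to the three classes defined in the text."""
    if PREFERRED.get(CODON_TABLE[codon]) == codon:
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


def replace_codons(dna: str) -> str:
    """Swap each codon for the preferred codon of the same amino acid, if listed."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in split_codons(dna))


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


example = "ATGAATCCAGTAATAAGT"  # first six codons of SEQ ID NO: 36
rewritten = replace_codons(example)
assert translate(rewritten) == translate(example)  # protein is unchanged
print([classify(c) for c in split_codons(example)])
print(rewritten)
```

This sketch replaces every eligible codon. Partial replacement, as contemplated in claims 9 and 10, would need a selection step that is not shown here.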
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: CrCSACATCC ATTCTSCTCT AAACCAGA7A CC=5CCCACA CACCCTCACC 
£1 T3C0GTSCCC AOCTCCCCAG GCTGACGCAA GAGAACGCCA GAAACCATGC 

cgatcgggtc t.-tgcaaccg ctsgccacct tgtacctcct ggogatcctg 
3tc3cttcc3 tsctacccac csasaagcts tgggtgaccg tgtactacgg 

201 C3TGCC33TG TXJAAGGAGG CCACCACCAC CrtGTTrrSC 3CCAGCGACG 
2 SI C:AACGCGTA CSACACCGAG GT3CACAAC3 T370GCCCAC CCACGCGTGC 
2:: GTGCCCACCS ArCCCAACC: CCAGGACGT3 GAGCTC3T3A ACGTGACC3A 
2 51 3AACTTCAAC ArGTSGAACA ACAACATG3T 33AGCACA7G CATGACCACA 
4:1 TCATCA3CCT CiTGGGACCAS AGCCTSAAGC CCTCCSTGAA CCTGACCCCC 
4 51 3T3TCC3TGA C 3~3AACT3 CAC33ACCTC AGGAACACCA CCAACACCAA 
5:1 :AACAGCACC (^CCAACAACA ACAGCAACAG C3AGGGCACC ATCAAGGGCG 
SSI 3CGAGA7GAA CAAC7GCAGC 77CAACA7CA CCACCAGCA7 CC3CGACAAG 
601 A7CCA0AACC. A37AC3CCC7 GC7G7ACAAG C73GATA7C3 7GAGCA7CGA 
=£1 CAACGACAGC A3C:AGC7ACC GCC7GA7C7C C7GCAACACC AGCC7GA7CA 
":i C3CAGCCC7S QCCCAACATC AGCT7CGACC C3A7CCCCAT CCAC7ACTGC 
-S: 3C3CC33CC5 9r:77C3CCA7 CC7GAAC73C AACGACAAGA ACTTCAGCGG 
30: CAAGGGCAGC T3CAAGAACG 7GAGCACCG7 GCAC7GCACC C:ACGGCA7CC 
2 51 3GCC3G7GG7 qAGCACCGAG C7SC7GC70A ACGGCAGCC7 3GCCGAGGAG 
9:: 3AGGT6G7GA TCCOCAGCGA CAAC77CACC 3ACAAC3CCA AGACCA7CA7 
951 C37GCACC7G AA7GAGAGCC 73CAGATCAA C7GCACGCG7 CCCAACTACA 
10 CI ACAAGCGCAA aCGCA7C3AC A7C5GCCCC3 5GCGC3CC77 C7ACACCACC 
1:51 AAGAACA7CA TCGGCACCAT CC3CCA00CC CACTOCAACA 7CTC7AGAGC 
1101 CAACTCGAAC CACACCC73C GCCAGA7C37 3AGCAACC73 AACCAGCACT 
1151 7CAAGAACAA CACCATCG7S 77CAACCAGA GCAGCGGCGO CGACCCCGAC 
1:01 A7CG7GA7GC ACAGC77CAA C70CGCCGGC GAAT7C77C7 AC7GCAACAC 
12 SI CAGCCC3C7G TTCAACAGCA CC7CGAAC3G CAACAACACC 73GAACAACA 
1301 CCACC3GCAC CAACAACAAT A7TACCC7CC AGTCCAAGAT CAASCACATC 
13 SI A7CAACA7G7 CGCAGGAGG7 GGGCAAGGCC A7G7ACGCCC CCCCCATCCA 
KOI GCGCCAGATC CGG7GCACCA CCAACATCAC C3G7C7CC7G C7CACCCCCG /^'^ ' 



"^096^319 PCrAJS95/11511 

2/12 

-501 3GCGGCSACA TSCSCSACAA CTCGACA-C- ■ , 

— «OA.CT -««w.5TACA AGTACAACCT 

CCTCACSATC i\CCCr~3G 

-►«C .-...c=C= CAC=AAC5CC AACC5CCGC5 

-s21 tSCTCCACCC CiACAAGCSC TAAAa / *r^^ 
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I a;;c:;acaacc -tccstsac cctstactac Gccsrr.-rcs tstsgaagca 

=: ccccAccAc: AccrTcrrrr crcccAccG* rnccAAccscs tacsacaccg 

::: ajctccacaa c:t"ccccc AcrrAcc;cr:T 3C3T3crcAc ccaccccaac"" 

.5. JCCCACCACC r:CA5CTr2T :;AAccT.-;Arc cagaacttca acatstscaa 

::: ^aacaacatc ctscaccaga zTr.x7r.x0zx ca?catcagc cts-ogcac- 

:s: A3ACCCTCAA ^:r--r:.-r-3 aacctsaccc rcrrcrscsT caccctcaac 

::i TscAcr^Ar- t-agcaacac :accaacacc aacaacacca c::ccv.caa 

:AACAr.CAAC .\ZZZAZOOZ.\ CCATCAACGC CCCCCACATO AAwAACTCCA 

-rrrzAACAT c^ccaccacc atccccgaca acatccag/^x gga<;-;acccg 
-r:;cTGTACA agctggatat cgtgaccatc :acaacga;:a ,;caccagcta 

551 t:a3cttc:a ^zzzx-zzz atcvacvact GCGrrrcrcr CGGCTTCGcr 

ATCCT^AAC? qCAAC:;w.V. JAAGTrCACC GGCAAGGGCA GCTGCAAGAA 

•:=■ ::rcAGCArr :tc::ag.gca rccACGGCAT ccggccgctg GTGAccAcrr 
ArcTcrrGCT cjaaw-gcagc rrcGcrcAGG aggagcts-,t jarrr-ncAcc 
w.-.c.^^cr:,A ccgacaacgc caagaccatc atcgtgcacc tcaatgacag 

30 : JClGCAGATt AACTGCACGC GTCCCAACTA CAArAACCGC AAGC3CATCC 
ill XZX7ZZZZZZ CZZZZZZZZZ rrCTACACra CZAAGAACAT CATCGGCACr 
i:: ATCCG-AGG crrACTGCAA CArrrCTACA GCCAAGTGGA AGGACACCr: 

:-5: ;rGcrAGATr c^tgaccaa.-^-: tzaaggagca GrrcAAGAAC aagagcatcg 
:::: tg—gaac-a gav-cagcggc sgcgagcggg agatcgtgat ggacaccttc 
:::: AAcr-c-rG (^cgaattgtt gtagtgcaac aggagcgggg tg— aacag 

-AGrrGGAAC (SGCAACAACA CGTGGAACAA CAGG.\CrJGC AGGAACAACA 

:::: ATArrAGcrr cgagtggaag atgaaggaga catgaagat gtgggaggag 

GTGGGCAAGG CGA7GTAGGG GGGGGGGATG ^AGGGGGAGA TGGGGTGGAC 
:2s: GAGGAACATG ACGGGTrTGC "GTGAGGGG GGAGGGGGGG AAC.-.ACACGG 
:::: AGAGG.V/,GA CACGGAAATG TTGGGGGrrG GGGGC"rr:A GATGGGGGAC 
-1:1 .^ACTGGAGAT CTGAGGTGTA rAAGTACAA.-, r.TGGTCAGGA TCGAGCGGGT 

:ggggtgggg cggacgaacg -GAinrnGGG ggtggtggag ggggagaacg 
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USl 3CSCC3CCAT CiOCSCCCTa TTCCTSCCCT TCCTGSGCCC OGCS^CACC 

:501 ACCATOGCCG C,-3CCACC3T CACCCTCACC 3T5CACGCCC SCCTSCTCCT 

:S51 -ACCSOCATC ^TOCAGCAGC AGAACAACCT CrTCCSCGCC A7CSAGGCCC 

IsOl AGCAGCATAT Q^rTCCAGCTr ACC375TCGG GCATCAACCA CCTCCAGGCC 

C3C3TCC73G c:=T3gacc;g ;tacct3aag gaccagcagc TcrrsGGcr: 
rrcGGCcrsc rrcGSCAAoc tsatctgcac caccacggta ccctggaacg 

l-Sl CrrCCTGGAC C^ACAAGAGC CT3GACGACA TCTGGAACAA CAT3ACCT35 

Is:: ATSCAGTSGG A.:CSCGAGA? CGATAACTAC ACCAGCCT3A TCTACACCr: 

liSi 3CTGGAGAAG A^CCAGACCC AGCAGGACAA GAAC3AGCAG GAGCT3CTGG 

iSCl A3CTS3ACAA C:3GGC3A3C CTSTSGAACT 3GTTC3ACAT CACCAACT3G 

is;: CTGTSGTACA rc.^AAATCTT rATCATGATT C73GGC3GCC 73G7GGGCC7 

:::: :C3CA7C3r3 T773C=37:: T:AGCA7737 :AACC3C37G C3CCAGGGC7 

::«: ACAGCr=CC7 GAGCC7C=AG ACCCGGCCCC C3373CC3C3 C3GGCCC3AC 

:::: rSCrtCSAGG CCA7=3AGGA 3GAGGGCGGC 3AGC3C3ACC 3C3ACACCAG 

::=: r3GCAGGC7= q7GCAC3GC7 7:C73GCGA7 CA7C7GGG7C 3ACC7CCGCA 

:::: 3CC73777C7 ^773A3C7AC 3ACCAC33C3 ACr:3C7GC7 3A7C3CC3CC 

225: C3CA7C37GC AAC7CC7ACG C33CC3CGGC 733GACG73C 7GAAG7AC73 

:3Ci C73GAACC7C C73CAG7A77 GGAGCCAGGA 3C7GAAG7CC A3CGCC37GA 

:•£: 3C::73G7GAA C333AC33C: A7C3CCG73G CrSAGGGCAC 33ACCGC373 



..CAGACGSC =3GGAGGGCG A7rC73CACA 7CCCCAC 
:;■: :A7:C3CC.\G :;GGC7C3AGA 3GGC3C73C7 3 '3<f<^ I? WOJi*') 
FIGURE 5